ABSTRACT


Batching is a commonly used method for calculating confidence
intervals on the mean of a sequence of correlated observations arising
from a simulation experiment. Several recent papers have considered the
effect of using too many batches. The use of too many batches fails to
satisfy assumptions of normality and/or independence, resulting in
incorrect probabilities of the confidence interval covering the mean.

This paper considers the effects of using fewer batches than are
necessary to satisfy normality and independence assumptions. Using too
few batches results in 1) correct probability of covering the mean, 2)
an increase in expected half length, 3) an increase in the standard
deviation of the half length, and 4) an increase in the probability of
covering incorrect values of the mean (analogous to Type II error in
hypothesis testing). These effects, quantified here, are shown to be
small when at least eight to ten batches are used, with least effect on
confidence intervals having low confidence values. With the effects of
using too few batches quantified, a simulation practitioner can make the
trade-off between the ease of using very few batches with known
independence and normality versus using a batching algorithm to squeeze
some remaining information from the data. For researchers developing
batching algorithms, the results are useful in selecting initial batch
sizes. The results may also be useful in the context of using
independent replications to establish confidence intervals on the mean.



Finally, some criteria and a procedure are suggested for Monte Carlo
comparison of confidence interval procedures. These suggestions are
not restricted to batch mean algorithms.


1. INTRODUCTION


The determination of confidence intervals on the mean of a process
arising from simulation experiments has been a problem of long-standing
interest for computer simulation practitioners and researchers. Five
approaches have evolved: independent replications, batching,
regeneration, autoregressive representation, and spectral analysis, as
discussed, for example, in Fishman [3], Kleijnen [4] and Law and Kelton
[8, 9]. We discuss only the first two here, with emphasis on batching.

Consider observations $X_1, X_2, \ldots, X_n$ from a simulation
experiment. We assume that the output is a covariance stationary
process; i.e., all initial transient effects have been removed. Let
$\mu$ denote the process mean, and let $R_h$ denote the lag-$h$
covariance $E\{(X_i-\mu)(X_{i+h}-\mu)\}$. We assume that
$R_h \ge R_{h+j}$ for $j = 1, 2, \ldots$. The variance of the process is
$R_0$, also denoted in this paper as $\sigma^2$. The lag-$h$
autocorrelations are $\rho_h = R_h/R_0$. The point estimator of $\mu$
considered here is the sample mean $\bar{X} = \sum_{i=1}^{n} X_i/n$,
which has expected value $E\{\bar{X}\} = \mu$ and variance
$V\{\bar{X}\} = c\sigma^2/n$, where
$c = 1 + 2\sum_{h=1}^{n-1} (1-(h/n))\rho_h$ is the number of correlated
observations containing the same information as one independent
observation.
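
As a rough illustration of the factor c, the following sketch evaluates
the sum for a given autocorrelation function. The AR(1)-style
autocorrelation $\rho_h = \phi^h$ and the function names are assumptions
made only for this example; they do not come from the paper.

```python
def correlation_factor(rho, n):
    """Return c = 1 + 2 * sum_{h=1}^{n-1} (1 - h/n) * rho(h)."""
    return 1.0 + 2.0 * sum((1.0 - h / n) * rho(h) for h in range(1, n))


def ar1_rho(phi):
    """Lag-h autocorrelation phi**h of a hypothetical AR(1) process."""
    return lambda h: phi ** h


if __name__ == "__main__":
    n = 10_000
    c = correlation_factor(ar1_rho(0.5), n)
    # The variance of the sample mean is then c * sigma^2 / n.
    print(f"c = {c:.4f}")
```

For independent observations every $\rho_h$ is zero and the function
returns c = 1; positive autocorrelation gives c > 1, so more
observations are needed for the same precision.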

Batching, discussed as early as 1963 by Conway [1], is a
conceptually straightforward method for computing confidence intervals
on $\mu$ by transforming correlated observations into fewer (almost)
independent and (almost) normally distributed observations. Define the k
batch means

$$\bar{X}_i = \sum_{j=(i-1)m+1}^{im} X_j/m \quad \text{for } i = 1, 2, \ldots, k,$$

where m is the batch size n/k. We assume either that the problem of n/k
not being integer is insignificant or that n/k is integer. The mean of
each batch mean is $\mu$ and the sample variance is
$S_k^2 = (\sum_{i=1}^{k} \bar{X}_i^2 - k\bar{X}^2)/(k-1)$. If k is chosen
small enough that the dependence between the batch means and the
nonnormality of the batch means are negligible, then $S_k^2/k$ is an
unbiased estimator of $V\{\bar{X}\}$ and a valid $(1-\alpha)100\%$
confidence interval on $\mu$ is $\bar{X} \pm H_k$. Here
$H_k = t_{\alpha/2,k-1} S_k/\sqrt{k}$ is the half length of the
confidence interval, with $t_{\alpha/2,k-1}$ denoting the $1-(\alpha/2)$
quantile of the t distribution with k-1 degrees of freedom.
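
The batch means interval described above can be sketched as follows.
This is a minimal illustration, not a published algorithm: the function
name is hypothetical, the t quantile is passed in by the caller rather
than computed, and leftover observations are simply dropped when n/k is
not an integer.

```python
import math
import statistics


def batch_means_interval(x, k, t_quantile):
    """Return (grand mean, half length H_k) from k nonoverlapping batch means.

    t_quantile is the 1 - alpha/2 quantile of Student's t with k - 1 degrees
    of freedom, supplied by the caller (e.g. 2.262 for alpha = .05, k = 10).
    """
    m = len(x) // k  # batch size; leftover observations are dropped
    if k < 2 or m < 1:
        raise ValueError("need k >= 2 batches with at least one observation each")
    means = [statistics.mean(x[i * m:(i + 1) * m]) for i in range(k)]
    grand_mean = statistics.mean(means)
    s_k = statistics.stdev(means)  # sample standard deviation, divisor k - 1
    return grand_mean, t_quantile * s_k / math.sqrt(k)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    center, half = batch_means_interval(data, k=2, t_quantile=12.706)
    print(f"{center} +/- {half:.3f}")
```

The interval is only as good as the assumption that the k batch means
are close to independent and normal, which the code does not check.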

The primary question facing both practitioners and researchers is
the selection of the appropriate number of batches k. If the only goal
were to have a confidence interval which has a probability of
$1-\alpha$ of covering the mean, then k=2 would always be optimal, since
the two batches would each contain the largest possible number of
observations (allowing the central limit theorem to create normality)
and would tend to be less correlated (since the observations in the
batches are farther apart), thereby best satisfying normality and
independence assumptions. However, other measures of goodness are
important. Probably the most used criterion, other than probability of
coverage, is the expected half length of the confidence interval,
$E\{H_k\}$. The loss of information which occurs with the extreme
batching associated with k=2 causes $E\{H_k\}$ to be much larger than if
more batches are used, as seen in Section 2. In general, the more
batches used, the less information lost and the shorter the expected
half length. Thus there is a tradeoff between expected length and
coverage which makes the selection of number of batches difficult.


We examine here the penalty of using fewer batches than are
necessary to satisfy normality and independence assumptions. When k is
smaller than necessary, normality and independence still hold, but the
loss of information causes deteriorating performance of the confidence
intervals. In addition to measuring this deterioration of the
performance by the probability of covering $\mu$ and the expected half
length $E\{H_k\}$, two other measures are suggested here: the standard
deviation of the half length, $\sqrt{V\{H_k\}}$, and the probability of
covering points $\mu_1 \ne \mu$. The variance of the half length is
important since a confidence interval procedure with high variance gives
false signals as to the accuracy of the estimate on a large fraction of
the simulation runs. The probability of covering points which are not
the true mean is analogous to type II error in hypothesis testing: the
lower the probability the better the procedure. Curves analogous to
operating characteristic curves are the subject of Section 3. Section 2
gives properties of the half length $H_k$. Section 4 discusses
implications of the results of Sections 2 and 3 for both practitioners
and researchers. Section 5 suggests that the probabilities of coverage
be used to compare confidence interval procedures, analogous to
comparing alternative tests of hypothesis with operating characteristic
curves.

2. PROPERTIES OF THE HALF LENGTH

To discuss the effects of too few batches, we need to first
establish a base point for comparison. Let k* and m* denote the number
of batches and batch size, respectively, that are necessary for the
nonnormality of the batch means and the dependence of the batch means to
be negligible. Establishing values for these quantities is difficult,
but for our purposes they need not be actually determined. For
simplicity in the following analysis, we assume that k* batches are
sufficient to provide exact normality and independence and therefore
exact $1-\alpha$ level confidence intervals. This assumption of exact
normality and independence is relaxed at the end of Section 4.

The assumption that $R_h$ decreases as h increases, made in Section
1, implies that using fewer than k* batches will also result in
normality and independence. This is usually a valid assumption, but, for
example, Schriber and Andrews [12] consider sequences of independent
trivariate normal observations for which the results of this paper do
not hold, since there m* = 3 satisfies normality and independence
exactly, but m = 4, 5, 7, 8, 10, ... do not.

Consider the expected half length resulting from k batches.
$E\{H_k\}$ is inversely proportional to the square root of the sample
size n when observations are independent, and even for correlated
observations a quadrupling of the sample size will cut the expected half
length to about one-half its original value. The same is not true for
the number of batches, k, when n remains constant. This is because
$S_k$, the standard deviation of the k batch means, is a function of k.
Similar results are true of the variance of the half length,
$V\{H_k\}$. Nevertheless, changing k does affect these properties, as
shown in Table 1.

Table 1 shows $E\{H_k\}$ and $\sqrt{V\{H_k\}}$ for k = 2, 3, 4, 5,
6, 10, 30, 61, 121, and $\infty$ and for $\alpha$ = .10, .05, and .01.
The units are $\sqrt{V\{\bar{X}\}} = \sigma\sqrt{c/n}$, making the table
valid for all values of $\mu$, $\sigma^2$ and n. The correlation
structure between the batches is not a factor so long as $k \le k^*$,
since then the batch means are independent and normally distributed.
The associated t distribution quantiles are shown, as well as the
dimensionless bias ratio $r = E\{S_k/\sqrt{k}\}/\sqrt{V\{\bar{X}\}}$
upon which the other quantities depend. The values in Table 1 are
derived in Appendix A. Note that the values are deterministically
calculated, rather than being the result of Monte Carlo experiments.
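
The deterministic calculation behind such a table can be sketched in a
few lines, assuming independent, normally distributed batch means. The
formulas used are $r = (2/(k-1))^{1/2}\,\Gamma(k/2)/\Gamma((k-1)/2)$,
$E\{H_k\} = t\,r$ and $\sqrt{V\{H_k\}} = t\,(1-r^2)^{1/2}$, all in units
of $\sqrt{V\{\bar{X}\}}$. The t quantiles below are standard table
values entered by hand for $\alpha$ = .05, and the function names are
illustrative only.

```python
import math

# Upper .025 quantiles of Student's t (standard table values). The key is
# the number of batches k, so the degrees of freedom are k - 1.
T_975 = {2: 12.706, 5: 2.776, 10: 2.262, 30: 2.045, 61: 2.000}


def bias_ratio(k):
    """r = E{S_k}/sigma_k = sqrt(2/(k-1)) * Gamma(k/2) / Gamma((k-1)/2)."""
    log_ratio = math.lgamma(k / 2) - math.lgamma((k - 1) / 2)
    return math.sqrt(2.0 / (k - 1)) * math.exp(log_ratio)


def half_length_moments(k, t):
    """Return (E{H_k}, sqrt(V{H_k})) in units of sqrt(V{X-bar})."""
    r = bias_ratio(k)
    return t * r, t * math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    print(" k      t      r   E{H}  SD{H}")
    for k, t in T_975.items():
        mean_h, sd_h = half_length_moments(k, t)
        print(f"{k:2d} {t:6.3f} {bias_ratio(k):6.4f} {mean_h:6.3f} {sd_h:6.3f}")
```

The log-gamma form is used only to keep the gamma ratio numerically
stable for large k; for small k a direct ratio of `math.gamma` values
would do equally well.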

Table  1  about  here 

As expected, $E\{H_k\}$ decreases monotonically as k is increased
for all values of $\alpha$. The rate of decrease is much larger for
small values of k than for large values. In fact, the decrease in
$E\{H_k\}$ associated with increasing k from ten to infinity is only
about twelve percent for $\alpha$=.05. The correct comparison is not
between ten and infinity, however, but between ten and k*, since more
batches than k* do not result in valid confidence intervals. The
decrease in length is about ten percent when k*=61 and about eight
percent when k*=30.

Although the expected half length is robust for $k \ge 10$, the
standard deviation exhibits a different pattern. While also decreasing
as k increases, and decreasing more rapidly for small values of k than
for large values of k, $\sqrt{V\{H_k\}}$ is affected more by k than is
$E\{H_k\}$, indicating that the stability of the confidence interval
associated with less variance may be a reason to exert more effort to
use many batches. However, again k should not be compared with the
limiting results at infinity, but rather with k*. For $\alpha$=.05 and
k=10, the standard deviation is decreased about 48 percent if k*=30 and
about 65 percent if k*=61. Thus the major benefit of using more than ten
batches may be more in reducing $V\{H_k\}$ than in reducing $E\{H_k\}$.


3.  PROBABILITIES  OF  COVERAGE 

Although consideration of the moments of the half length is
intuitively appealing, a more comprehensive criterion is to compute the
probability, $\beta(\mu_1)$, that the confidence interval covers
$\mu_1$, as a function of $\mu_1$, i.e.,
$\beta(\mu_1) = P(\bar{X} - H_k < \mu_1 < \bar{X} + H_k)$. When
$\mu_1 = \mu$ this is the commonly considered probability of coverage of
the mean, analogous to one minus the probability of type I error when
testing hypotheses. When $\mu_1 \ne \mu$ this probability is analogous
to the type II error. The coverage function is more comprehensive than
the expected value and variance of the half length because the coverage
function considers the performance of $\bar{X}$ and $S_k$ together while
the half length is a function of $S_k$ only. In addition the coverage
function is directly related to the final product: covering the mean of
the process.

We are interested in calculating the probability, $\beta(\mu_1)$, of
covering $\mu_1$ as a function of $\mu$, $\sigma^2$, n, $\alpha$ and k
when $k \le k^*$. Results for $\alpha$ = .10, .05 and .01 are shown in
Figures 1, 2, and 3, respectively. Each figure is valid for all values
of $\mu$, $\sigma^2$ and n by plotting the probabilities of coverage as
a function of $\delta = |\mu_1-\mu|\,(n/(c\sigma^2))^{1/2}$. The
derivation of the values, based on the noncentral Student's t
distribution, is given in Appendix B.
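
As a hedged numerical check on the coverage function, the sketch below
evaluates $\beta$ as a function of $\delta$ without noncentral t tables.
It conditions on $S_k$ and integrates the normal probability against
the chi density by Simpson's rule, which is a substitute for the
tabulated values used in the paper, and it assumes exactly independent
normal batch means. Function names, the integration range and the step
count are choices made for this example.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chi_pdf(u, nu):
    """Density of the square root of a chi-square variate with nu d.f."""
    if u <= 0.0:
        return math.sqrt(2.0 / math.pi) if nu == 1 else 0.0
    log_pdf = ((1.0 - nu / 2.0) * math.log(2.0) + (nu - 1.0) * math.log(u)
               - 0.5 * u * u - math.lgamma(nu / 2.0))
    return math.exp(log_pdf)


def coverage(delta, k, t, steps=4000):
    """P(|X-bar - mu_1| < H_k) for k independent normal batch means.

    delta is |mu_1 - mu| in units of sqrt(V{X-bar}); t is the t quantile
    (k - 1 degrees of freedom) used to build the interval.
    """
    nu = k - 1
    upper = math.sqrt(nu) + 10.0  # the chi density is negligible beyond this
    h = upper / steps

    def integrand(u):
        half = t * u / math.sqrt(nu)  # H_k in units of sqrt(V{X-bar}), given S_k
        prob = normal_cdf(half - delta) - normal_cdf(-half - delta)
        return prob * chi_pdf(u, nu)

    # Composite Simpson's rule on [0, upper]; steps must be even.
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    for delta in (0.0, 1.0, 3.0):
        print(delta, round(coverage(delta, 10, 2.262), 3),
              round(coverage(delta, 30, 2.045), 3))
```

At $\delta$ = 0 the function should return approximately $1-\alpha$ for
any k, which is a convenient sanity check on the quantile supplied.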

Figures  1,  2  and  3  about  here 

For $k \le k^*$, the following patterns emerge from Figures 1, 2, and 3:

1. The probability of covering $\mu$ (corresponding to $\delta$=0) is
   $(1-\alpha)$ for k = 2, 3, ..., k*.

2. The decrease in $\beta(\mu_1)$ due to incrementing k by one decreases
   as k increases.

3. The decrease in $\beta(\mu_1)$ due to incrementing k by one decreases
   as $\alpha$ increases.

4. The decrease in $\beta(\mu_1)$ due to incrementing k by one is small
   when $\delta < 1$.

5. The decrease in $\beta(\mu_1)$ due to incrementing k by one is
   independent of n, other than that k* increases with n.

As with moments of the half length of the intervals, it is
important here to distinguish between the effect of increasing the
sample size n and increasing the number of batches k. For an increase in
the number of observations n, there is a corresponding increase in
$\delta$, which is proportional to $\sqrt{n}$, and a corresponding
decrease in the probability of covering any point other than
$\mu_1 = \mu$. The effect of increasing the number of batches k is much
less. For example, when $\alpha$=.05 and k=10, tripling $\delta$ from
$\delta$=1 to $\delta$=3 (i.e., increasing n by a factor of nine)
reduces the coverage from .87 to .23. However, tripling the number of
batches from k=10 to k=30 reduces the coverage from .86 to .83 when
$\delta$=1 and from .23 to .18 when $\delta$=3.


4.  IMPLICATIONS 

The  results  of  Sections  2  and  3  quantify  the  effects  of  using  too 
few  batches.  Here  we  discuss  the  implications  these  results  have  for 
both  practitioners  and  researchers  interested  in  developing  algorithms 
to  determine  the  number  of  batches  to  be  used. 

When running a simulation experiment, the practitioner faces two
constraints: (1) the run must be long enough to provide the desired
accuracy and (2) the run must be long enough to calculate a valid
confidence interval. (For a related discussion, see Lavenberg and Sauer
[5, p. 555].) Now if the latter constraint comes into play, then using
a small number of batches will allow the simulation run to be
terminated earlier than if more batches are demanded, while still
resulting in a confidence interval with correct coverage. The results
of the last two sections can be used to determine the penalty for using
a specific (small) number of batches.

Probably a much more common situation is when the accuracy required
forces the run to be longer than the number of observations necessary to
establish a valid confidence interval. The results of Sections 2 and 3
show that there is seldom a need to use more than k=30 batches and that
k=10 batches contain almost all the information in the data regardless
of the number of observations n. Thus the practitioner should seldom
exert much effort to increase the number of batches when k>30, and
hardly ever when k>60.

The implication for researchers interested in constructing
algorithms for determining batch sizes is to place less emphasis on
obtaining very large numbers of batches. Four batching algorithms have
appeared in the literature: Law and Carson [7], Mechanic and McKay [10],
Fishman [2], and Schriber and Andrews [12]. We discuss the implications
of the results of Sections 2 and 3 on each.

Law and Carson require a minimum of k=40 batches. This research
shows that a minimum of k=10 batches will be almost as effective and
result in shorter runs. The algorithm could be modified to try
initially for 40 batches, but before doubling n, k=20 batches could be
checked. If k=20 fails, then try k=10. If k=10 batches fail, then double
the sample size n, keeping k=10, since it appears that the first
constraint above applies. Similar comments apply to Mechanic and McKay,
who use a minimum of k=25 batches.
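
The fallback order suggested above (40, then 20, then 10 batches, then
doubling n while keeping k=10) can be sketched as control flow. This is
only an outline of the suggested modification, not Law and Carson's
algorithm: `passes_test` and `draw_more` are hypothetical caller-supplied
hooks standing in for the correlation check and for extending the
simulation run, and `max_doublings` is an arbitrary safety limit.

```python
def choose_batch_count(observations, passes_test, draw_more,
                       candidates=(40, 20, 10), max_doublings=10):
    """Return (k, observations) for the first batch count that passes.

    passes_test(observations, k) -> bool is a caller-supplied check that
    the k batch means look uncorrelated; draw_more(count) -> list returns
    `count` further observations. Both hooks are hypothetical.
    """
    # Try the largest batch count first, then fall back to smaller ones.
    for k in candidates:
        if passes_test(observations, k):
            return k, observations
    # Even the smallest k failed: double n and keep the smallest k.
    k_min = candidates[-1]
    for _ in range(max_doublings):
        observations = observations + draw_more(len(observations))
        if passes_test(observations, k_min):
            return k_min, observations
    raise RuntimeError("no candidate batch count passed within the run budget")


if __name__ == "__main__":
    # Toy check: pretend batches are acceptable once they hold 50 observations.
    toy_test = lambda obs, k: len(obs) // k >= 50
    toy_draw = lambda count: [0.0] * count
    k, obs = choose_batch_count([0.0] * 400, toy_test, toy_draw)
    print(k, len(obs))
```

In practice the choice of correlation test matters far more than this
control flow; the sketch only shows where smaller batch counts enter
before the sample size is doubled.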

Fishman's algorithm requires $k \ge 8$, not because of
considerations concerning the interval, but because the test used to
detect correlation between the batch means fails for small values of k.
This algorithm begins with k=n and iteratively halves k. Samples of size
n=2048, 4096, 8192 and 16384 are used for experimenting with the
algorithm, and n=111,716 is discussed as being necessary in not
unreasonable situations. The results of Sections 2 and 3 indicate the
initial value of k could be substantially smaller with almost no
deterioration in the confidence intervals.

Schriber and Andrews modify Fishman's algorithm by considering
every possible batch size yielding $k \ge 8$ and selecting the value of
k corresponding to the test statistic least indicating correlation
between the batch means. Their algorithm could be modified to consider
all possible batch numbers between eight and some value between thirty
and sixty. Rather than selecting the number of batches with the test
statistic value closest to zero, the algorithm could be modified to
consider the advantages of larger values of k. For example, if k=8 is
indicated, but k=30 also easily passes the correlation test, then k=30
should be considered because of its better properties.

Another implication concerns calculating confidence intervals on
the mean based on the use of independent replications, for which
notation similar to batch means may be defined. Let $\bar{Y}_i$ denote
the sample average of the $i$th simulation run having m observations,
i=1, 2, ..., k. Then the point estimate of $\mu$ resulting from the k
runs is $\bar{Y} = \sum_{i=1}^{k} \bar{Y}_i/k = \sum_{i=1}^{n} X_i/n$.
A confidence interval may be calculated based on $\bar{Y}$ and $S_k$
exactly as with batch means. For a detailed comparison of replications
and batch means, see Law [6].

Since each replication has an associated overhead of initializing
the run and removing the initial transient effects, k=2 replications
have an advantage compared to using more replications. The m=n/2
observations per run give $\bar{Y}_1$ and $\bar{Y}_2$ the best chance of
being normally distributed. Independence is guaranteed for all values of
k by using different random number seeds. Therefore k=2 minimizes
computation and has the best chance of satisfying the assumptions
necessary for obtaining the desired level of coverage. However, the
resulting confidence intervals have a larger expected half length than
when larger values of k are used. Thus a tradeoff between
initialization cost and information loss must be made, just as with
batch means.

The results of Sections 2 and 3 imply that every effort should be
made to use at least eight to ten replications, but that little gain
results from using many more than ten or twenty replications. Since
initial transients are often a major factor when using independent
replications, our recommendation is to use more than ten replications
only when the cost of dealing with the initial transient effect is very
small. This is a very general recommendation, but hopefully the
quantification of effects in Sections 2 and 3 will be useful for
practitioners when making the tradeoff between few and many
replications.

A final implication concerns the lack of knowledge about k*, the
number of batches for which nonnormality and dependence of the batch
means are negligible. In the analysis of Sections 2 and 3 we assumed
that k* batches were sufficient to provide normality and independence
exactly. Since in most simulations some violation of the assumptions
occurs for all batch sizes, there is some advantage to using smaller
numbers of batches which is not reflected in the analysis. That
advantage is that the smaller number of batches will more closely
satisfy normality and independence, thereby yielding more exact coverage
probabilities for the mean.

5.  COMPARING  CONFIDENCE  INTERVAL  PROCEDURES 

The analysis of coverage functions in Section 3 leads to a general
method for comparing confidence interval procedures. Just as two tests
of hypothesis can be compared by calculating operating characteristic
curves, procedures for calculating confidence intervals can be compared
by empirically estimating the coverage function for various values of
$\alpha$ and $\mu_1$. Before stating the empirical procedure explicitly,
first consider Figure 4, which summarizes the information in Figures 1,
2, and 3 for k=10 and k=$\infty$. Contours of the coverage probability
$\beta$ are plotted as functions of $\alpha$ and $\delta$. (Recall that
$\delta$ and $\mu_1$ differ only in location and scaling, so
alternatively the $\delta$ axis may be thought of as a rescaled $\mu_1$
axis.) The solid curves corresponding to k=$\infty$ are all lower than
the dashed curves corresponding to k=10, indicating that the larger
numbers of batches are preferable, as long as the normality and
independence assumptions hold. Also as long as these two assumptions
hold, the contour curves intersect the $\alpha$ axis at $1-\beta$. This
property corresponds to the coverage function described by Schruben
[13], who suggests that confidence interval procedures be compared by
empirically determining the probability of coverage of the mean as a
function of $\alpha$. Thus the suggestion made here is to generalize
Schruben's coverage function to be a function of $\mu_1$ as well as
$\alpha$.

The Monte Carlo estimation of the coverage function is
straightforward. Let $\alpha_j$, j=1, 2, ..., J, denote the $\alpha$
values of interest, such as .2, .1, .05, and .01. Let $\beta_l$, l=1, 2,
..., L, denote the $\beta$ values of interest, say .7, .8, .9, and .95.
Perform R replications. In replication i, calculate the J confidence
intervals $(v_{ij}, w_{ij})$ using the procedure of interest, and store
these values either explicitly or in 2J histograms. Then for j=1, 2,
..., J and l=1, 2, ..., L, estimate $\mu_{jl}$ such that
$P(V_j < \mu_{jl} < W_j) = \beta_l$, where $V_j$ and $W_j$ are random
variables denoting the lower and upper bounds of the confidence
interval. The confidence contour corresponding to each $\beta_l$ can be
plotted using the points $(\mu_{jl}, \alpha_j)$, j=1, 2, ..., J. Note
that there are two values of $\mu_{jl}$, one less than $\mu$ and one
greater than $\mu$, as shown in Figure 5, but that symmetry allows
Figure 4 to show only the greater value.

The estimation of $\mu_{jl}$ involves the estimation of quantiles,
which is a bit more involved than the estimation of fractiles. If the
results are not to be plotted, the above procedure can be modified to
simply increment a counter $c_{jm}$ for each replication that
confidence interval j covers $\mu_m$, where $\mu_m$, m=1, 2, ..., M,
denote the values of $\mu_1$ of interest. Then the estimator for
$\beta_{jm}$, the probability of a $(1-\alpha_j)100\%$ confidence
interval covering $\mu_m$, is $c_{jm}/R$.
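
The counter-based estimator can be sketched as below. The procedure
under study here is a plain t interval on k independent standard normal
batch means, chosen only so the example is self-contained; in a real
comparison each replication would be a full simulation run and
`procedure` would be the interval method being evaluated. All names and
the sample generator are assumptions made for the example.

```python
import math
import random


def t_interval(sample, t):
    """Stand-in procedure: a t interval from k i.i.d. batch means."""
    k = len(sample)
    center = sum(sample) / k
    s = math.sqrt(sum((x - center) ** 2 for x in sample) / (k - 1))
    half = t * s / math.sqrt(k)
    return center - half, center + half


def estimate_coverage(procedure, t_by_alpha, mu_values, k, reps, rng):
    """Return {(alpha_j, mu_m): c_jm / R}, the estimated coverages."""
    counts = {(a, m): 0 for a in t_by_alpha for m in mu_values}
    for _ in range(reps):
        # One replication: k standard normal batch means, so the true
        # mean is 0 and sqrt(V{X-bar}) equals 1/sqrt(k).
        sample = [rng.gauss(0.0, 1.0) for _ in range(k)]
        for alpha, t in t_by_alpha.items():
            low, high = procedure(sample, t)
            for mu_m in mu_values:
                if low < mu_m < high:
                    counts[(alpha, mu_m)] += 1  # increment c_jm
    return {key: c / reps for key, c in counts.items()}


if __name__ == "__main__":
    k = 10
    t_by_alpha = {0.10: 1.833, 0.05: 2.262, 0.01: 3.250}  # t quantiles, 9 d.f.
    unit = 1.0 / math.sqrt(k)
    mu_values = [0.0, 1.0 * unit, 3.0 * unit]  # delta = 0, 1, 3
    est = estimate_coverage(t_interval, t_by_alpha, mu_values, k,
                            20000, random.Random(1))
    for key in sorted(est):
        print(key, round(est[key], 3))
```

Using the same replications for every $\alpha_j$ and $\mu_m$, as done
here, keeps the estimates for different intervals positively correlated
and so makes comparisons between them less noisy.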


APPENDIX  A 


We derive here the bias ratios r, the expected half lengths
$E\{H_k\}$, and the standard deviations of the half lengths
$\sqrt{V\{H_k\}}$ needed for Table 1. When $k \le k^*$, the assumptions
of independence and normality of the batch means are satisfied. Then
$S_k$ has a chi distribution with mean
$E\{S_k\} = \sigma_k (2/(k-1))^{1/2}\,\Gamma(k/2)/\Gamma((k-1)/2)$,
where $\sigma_k$ is the standard deviation of the k batch means and
$\Gamma(\cdot)$ denotes the gamma function. Also when $k \le k^*$,
$\sigma_k/\sqrt{k} = \sqrt{V\{\bar{X}\}}$. Since by definition
$r = E\{S_k/\sqrt{k}\}/\sqrt{V\{\bar{X}\}}$, we have
$r = (2/(k-1))^{1/2}\,\Gamma(k/2)/\Gamma((k-1)/2)$, which is
dimensionless.

The expected half length is
$E\{H_k\} = t_{\alpha/2,k-1} E\{S_k\}/\sqrt{k}$. From the definition of
r, we have directly that $E\{H_k\} = t_{\alpha/2,k-1}\, r$ in units of
$\sqrt{V\{\bar{X}\}}$.

The variance of the half length is
$V\{H_k\} = E\{H_k^2\} - E^2\{H_k\} = t_{\alpha/2,k-1}^2 E\{S_k^2\}/k - (t_{\alpha/2,k-1}\, r \sqrt{V\{\bar{X}\}})^2$.
Recalling that $E\{S_k^2\} = \sigma_k^2$ for all $k \le k^*$ and taking
the square root to obtain the standard deviation yields
$\sqrt{V\{H_k\}} = t_{\alpha/2,k-1} (1-r^2)^{1/2}$ in units of
$\sqrt{V\{\bar{X}\}}$.


APPENDIX  B 


Figures 1, 2, and 3 show the probability of covering $\mu_1$ as a
function of $\mu$, n, k, and $\sigma^2$. These probabilities may be
derived by noting that the probability of coverage
$P(|\bar{X}-\mu_1| < H_k)$ is equal to

$$P\left(-t_{\alpha/2,k-1} < \frac{\dfrac{\bar{X}-\mu}{\sigma_k/\sqrt{k}} + \dfrac{\mu-\mu_1}{\sigma_k/\sqrt{k}}}{S_k/\sigma_k} < t_{\alpha/2,k-1}\right) \qquad (1)$$

where $\sigma_k$ is the standard deviation of the k batch means. For all
$k \le k^*$, $(\bar{X}-\mu)/(\sigma_k/\sqrt{k})$ is a standard normal
random variable and the batch means are independent. Therefore the
entire center expression in (1) follows a noncentral Student's t
distribution with k-1 degrees of freedom and noncentrality parameter
$\delta_k = (\mu-\mu_1)/(\sigma_k/\sqrt{k})$. Again since $k \le k^*$,
$\sigma_k/\sqrt{k} = \sqrt{V\{\bar{X}\}}$, yielding
$\delta_k = (\mu-\mu_1)/\sqrt{V\{\bar{X}\}}$. Using the expression for
$V\{\bar{X}\}$ in the Introduction, we have
$\delta_k = (\mu-\mu_1)(n/(c\sigma^2))^{1/2}$. Since the probability of
coverage is the same for both $\pm\delta_k$, Figures 1, 2, and 3 are
drawn using $|\delta_k|$ to save space. The required noncentral t values
may be found in Owen [11].

Figure 1. Comparison, by number of batches k, of the probability
of covering points $\delta\sqrt{V\{\bar{X}\}}$ from $\mu$ when $\alpha$=.10.

Figure 2. Comparison, by number of batches k, of the probability
of covering points $\delta\sqrt{V\{\bar{X}\}}$ from $\mu$ when $\alpha$=.05.

Figure 3. Comparison, by number of batches k, of the probabilities
of covering points $\delta\sqrt{V\{\bar{X}\}}$ from $\mu$ when $\alpha$=.01.

Figure 4. Comparison of k=10 batches with the limiting case k=$\infty$.

Figure 5. Cumulative distribution functions for the lower and
upper confidence interval bounds for k=10 independent
and normally distributed batch means, $\alpha$=0.10.

